Estimating Example Difficulty using Variance of Gradients
In machine learning, a question of great interest is understanding what
examples are challenging for a model to classify. Identifying atypical examples
helps inform safe deployment of models, isolates examples that require further
human inspection, and provides interpretability into model behavior. In this
work, we propose Variance of Gradients (VOG) as a proxy metric for detecting
outliers in the data distribution. We provide quantitative and qualitative
support that VOG is a meaningful way to rank data by difficulty and to surface
a tractable subset of the most challenging examples for human-in-the-loop
auditing. Data points with high VOG scores are more difficult for the model to
classify and over-index on examples that require memorization.
Comment: Accepted to Workshop on Human Interpretability in Machine Learning (WHI), ICML 2020
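
As a rough illustration of the idea, the sketch below computes a VOG-style score with NumPy: the per-pixel variance of an example's input gradients across K training checkpoints, averaged over pixels. The gradient tensor shape and the absence of any further normalization are assumptions for illustration, not the paper's exact procedure.

    import numpy as np

    def vog_score(grads):
        # grads: shape (K, H, W) -- input gradients for one example,
        # collected at K training checkpoints (assumed input format).
        mean_grad = grads.mean(axis=0)                    # per-pixel mean across checkpoints
        per_pixel_var = ((grads - mean_grad) ** 2).mean(axis=0)
        return per_pixel_var.mean()                       # average variance over pixels

    # Toy usage: 5 checkpoints of gradients for a 32x32 input.
    rng = np.random.default_rng(0)
    print(vog_score(rng.normal(size=(5, 32, 32))))

Examples would then be ranked by this score, with the highest-scoring ones surfaced for human-in-the-loop auditing.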
Intriguing generalization and simplicity of adversarially trained neural networks
Adversarial training has been the topic of dozens of studies and is a leading
method for defending against adversarial attacks. Yet, it remains unknown (a)
how adversarially trained classifiers (a.k.a. "robust" classifiers) generalize
to new types of out-of-distribution examples, and (b) what hidden
representations are learned by robust networks. In this paper, we perform a
thorough, systematic study to answer these two questions on AlexNet, GoogLeNet,
and ResNet-50 trained on ImageNet. While robust models often perform on par with
or worse than standard models on unseen distorted, texture-preserving images
(e.g. blurred ones), they are consistently more accurate on texture-less images
(i.e. silhouettes and stylized images). That is, robust models rely heavily on shapes, in
stark contrast to the strong texture bias in standard ImageNet classifiers
(Geirhos et al. 2018). Remarkably, adversarial training causes three
significant shifts in the functions of hidden neurons: each convolutional
neuron often changes to (1) detect pixel-wise smoother patterns; (2) detect
more lower-level features, i.e. textures and colors instead of objects; and
(3) become functionally simpler, i.e. detect more limited sets of concepts.
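
The abstract does not specify the training procedure; the sketch below shows the standard PGD-based adversarial-training recipe (projected gradient ascent on the loss within an L-infinity ball around each input), which is the usual way such "robust" classifiers are trained. Inputs are assumed to lie in [0, 1], and the epsilon, step size, and step count are illustrative, not the paper's exact setup.

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
        # Random start inside the L-infinity ball of radius eps.
        x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
        for _ in range(steps):
            x_adv = x_adv.detach().requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv + alpha * grad.sign()           # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)      # project back into the ball
            x_adv = x_adv.clamp(0, 1)                     # keep valid pixel range
        return x_adv.detach()

    def adversarial_training_step(model, x, y, optimizer):
        # One optimization step on adversarially perturbed inputs.
        x_adv = pgd_attack(model, x, y)
        loss = F.cross_entropy(model(x_adv), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()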
Towards a Unified Framework for Fair and Stable Graph Representation Learning
As the representations output by Graph Neural Networks (GNNs) are
increasingly employed in real-world applications, it becomes important to
ensure that these representations are fair and stable. In this work, we
establish a key connection between counterfactual fairness and stability and
leverage it to propose a novel framework, NIFTY (uNIfying Fairness and
stabiliTY), which can be used with any GNN to learn fair and stable
representations. We introduce a novel objective function that simultaneously
accounts for fairness and stability and develop a layer-wise weight
normalization using the Lipschitz constant to enhance neural message passing in
GNNs. In doing so, we enforce fairness and stability both in the objective
function as well as in the GNN architecture. Further, we show theoretically
that our layer-wise weight normalization promotes counterfactual fairness and
stability in the resulting representations. We introduce three new graph
datasets comprising high-stakes decisions in the criminal-justice and
financial-lending domains. Extensive experimentation with these datasets
demonstrates the efficacy of our framework.
Comment: Accepted to UAI'21
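
The weight normalization can be sketched concretely: bounding a layer's Lipschitz constant amounts to rescaling its weight matrix by (an estimate of) its spectral norm, here obtained via power iteration. This is a generic spectral-norm sketch under that assumption, not necessarily NIFTY's exact normalization.

    import torch

    def lipschitz_normalize(weight, n_iters=20):
        # Estimate the largest singular value of `weight` by power
        # iteration, then rescale so the layer is (roughly) 1-Lipschitz.
        u = torch.randn(weight.size(0))
        for _ in range(n_iters):
            v = weight.t() @ u
            v = v / (v.norm() + 1e-12)
            u = weight @ v
            u = u / (u.norm() + 1e-12)
        sigma = u @ weight @ v        # spectral-norm estimate
        return weight / sigma

Applying such a rescaling to each message-passing layer keeps small input perturbations from being amplified through the network, which is the stability property the framework targets.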